LOW COMPLEXITY VIDEO DECODING

BACKGROUND OF THE INVENTION 

Technical Field of the Invention 

The present invention relates generally to the field 
of signal processing, and, more particularly, to a method 
and apparatus for decoding a compressed video signal for 
use by another unit having a lower resolution, or 
alternatively, an equal or higher resolution than the 
compressed video signal.

Description of the Prior Art

Video image signals representative of video pictures
are often processed at a first location (transmitter
location) to encode the video image signals into a
compressed video bit stream. The encoded bit stream may
then be transmitted from the first location to a second
location (receiver location) where the received bit
stream is decoded for displaying the video pictures,
processing, or storing the pixel values for later
retrieval at the receiver location. The receiver
location may, for example, process the decoded bit stream
to code with a new compression format, or display the
video pictures on a monitor or other display unit.

Video image signals may be displayed using a variety
of video formats, such as common intermediate format
(CIF) and quarter common intermediate format (QCIF). CIF
specifies a data rate of 30 frames per second (fps), with
each frame containing 288 lines and 352 pixels per line
(352 * 288). QCIF, a related standard, also specifies a
data rate of 30 fps; however, each frame contains only
144 lines and 176 pixels per line (176 * 144). QCIF is
therefore one-fourth the resolution of CIF. Several
other formats exist, e.g. PGA and MPEG, which provide a
multiplicity of resolutions available for displaying,
storing, processing, etc. a video signal.

It sometimes occurs that the unit for storing,
processing or displaying at the receiver location has a
different resolution than that of the compressed video
signal to which the bit stream corresponds. For example,
the bit stream may correspond to a CIF picture
resolution, whereas the unit for displaying, storing, or
processing at the receiver location might use a QCIF
resolution. This resolution difference necessitates that
a downscaling procedure be carried out at the receiver
location to permit the display unit to properly display
the lower resolution picture.

FIGURE 2 schematically illustrates a video decoding
procedure that is known in the prior art and that may be
carried out in receiver processing circuitry. Basically,
the procedure includes first decoding the compressed
video bit stream corresponding to, for example, CIF
resolution, and then downscaling the decoded signal in
order to, for example, display the image on a monitor
that uses a different resolution than the compressed
video bit stream. More particularly, the compressed
video bit stream 121 is decoded by first passing the
signal through an inverse discrete cosine transform
Patent Application 
Docket 34645-00507USPT 

(IDCT) 126. Then the prediction block 128 provides
motion compensation by applying the motion vectors to the
previous compressed video bit stream to form a
reconstructed image. After decoding, the image is
downscaled to produce the lower resolution image. The
image is passed through a low-pass filter (not
specifically shown), followed by a sub-sampling block 124
which sub-samples the image to produce the lower
resolution picture which can be stored, processed, or
displayed.

In the system illustrated in FIGURE 2 the signal 
received by the receiver apparatus is first decoded with 
full resolution. A downscaling process is then performed 
so that the picture will fit into the low resolution 
display of the display unit. Decoding with full 
resolution and then downscaling is a complex process 
which is quite demanding of both memory and CPU capacity 
in the receiver apparatus. 

SUMMARY OF THE INVENTION

The present invention provides an improved method 
and apparatus for processing a compressed video bit 
stream which corresponds to a first picture resolution so
that the picture may be properly displayed, stored, or 
processed by a unit having a second resolution. 

More particularly, when the second resolution is
lower than the first resolution, the present invention
includes the steps of downscaling the compressed video
bit stream, and thereafter decoding the downscaled
compressed video bit stream to provide the video signal
having the second resolution.

The present invention also provides a method for
displaying a video signal on a display unit with an equal
or higher resolution than that of the compressed video
signal. In this case, the video signal is displayed on
a portion of the display unit.

In accordance with the present invention,
downscaling of the compressed video bit stream is carried
out before the bit stream is decoded. This considerably
decreases decoding complexity, and requires less memory
and lower CPU power usage than in the prior art.

According to the presently preferred embodiment of
the invention, the downscaling step comprises removing
high frequency discrete cosine transform (DCT) components
of the bit stream. The subsequent decoding step utilizes
a novel decoding algorithm having a modified Inverse DCT
and a modified prediction block. The decoding algorithm
requires less memory and fewer calculations than prior
art techniques, and produces a picture quality which is
almost indistinguishable from that of the prior art
method.

Further advantages and specific details of the 
invention will become apparent hereinafter in conjunction 
with the following detailed description of presently 
preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS 

A more complete understanding of the method and
apparatus of the present invention may be obtained by
reference to the following Detailed Description when
taken in conjunction with the accompanying Drawings
wherein:

FIGURE 1 schematically illustrates an overall system
for processing video image data to assist in explaining
the invention;

FIGURE 2 schematically illustrates a known decoding
procedure for downscaling a video image signal;

FIGURE 3 schematically illustrates a decoding
procedure for downscaling a video image signal according
to a presently preferred embodiment of the invention; and

FIGURE 4 is a flow chart illustrating the video
decoding method according to a presently preferred
embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention will now be described more 
fully hereinafter with reference to the accompanying 
drawings, in which preferred embodiments of the invention 
are shown. This invention may, however, be embodied in 
many different forms and should not be construed as 
limited to the embodiments set forth herein; rather, 
these embodiments are provided so that this disclosure 
will be thorough and complete, and will fully convey the 
scope of the invention to those skilled in the art. 

FIGURE 1 schematically illustrates an overall system
for processing video image data to illustrate an
environment within which the video decoding method and
apparatus of the present invention may be utilized. The
system is generally designated by reference number 100
and includes a transmitter apparatus 102 and a receiver
apparatus 104. The transmitter apparatus 102 is at a
transmitter location and is adapted to receive an analog
or digital video image signal 110 from a video source
108. Video source 108 may be any video source such as a
video camera, a VCR, a DVD player, or any similar
apparatus that generates analog or digital video image
signals. The video source 108 may also be a video cable,
an antenna, or any other device that receives analog or
digital video image signals from a remote source.

Transmitter apparatus 102 includes suitable
processing circuitry 103 which converts the video image
signal 110 to a compressed video bit stream which
corresponds to the video image signal 110 utilizing
encoding techniques which are well-known to those skilled
in the art, and thus need not be described herein. The
transmitter apparatus 102 next transmits the compressed
video bit stream to the receiver apparatus 104 via any
suitable transmission path 105. As is also well-known in
the art, the encoding techniques, such as DCT encoding,
typically include applying appropriate compression
techniques to the signal so as to reduce the amount of
data used to represent the information in an image.



At the receiver apparatus 104, the processing
circuitry 107 processes the received compressed video bit
stream. The receiver processing circuitry 107 converts
the compressed video bit stream back to an analog or
digital video image signal 111 which is delivered to a
unit 112 such as a monitor, signal processing unit, or
storage unit which displays, processes, or stores the
picture represented by the signal.

Sometimes, the unit 112 at the receiver location has
a lower resolution than the resolution of the image to
which the received bit stream corresponds. For example,
the compressed video bit stream may correspond to a CIF
resolution whereas the unit 112 might use, for example,
a QCIF resolution. This difference in resolution
necessitates that a downscaling procedure be performed at
the receiver apparatus to permit the display unit to
properly display the image.

Alternatively, the display unit 112 at the receiver
location may have an equal or higher resolution than the
received bit stream image resolution. In this case, the
video signal is not displayed on the entire display unit
112, but only a portion of it.

In the present invention, the downscaling operation 
is performed at the bit stream level, before the decoding 
step, and this significantly decreases decoding 
complexity and reduces memory requirements. 

The decoding procedure according to the present 
invention is schematically illustrated in FIGURE 3. As 
shown in FIGURE 3, the compressed bit stream on line 121 
received by the processing circuitry 107 of the receiver 
apparatus 104 is first downscaled and is then decoded. 
The downscaling is illustrated by block 132 and involves 
the removal of DCT components. Thereafter, the signal is 
decoded by a video decoder loop 134. The video decoder
loop uses a modified inverse transform 136 and a modified 
predictor 138, which will be described more fully below. 

In a presently preferred embodiment, the unmodified
bit stream uses 8*8 DCT blocks. The downscaling block
132 in FIGURE 3 involves discarding the high frequency
components such that the modified block size is n * n,
where k <= n <= 8. The modified inverse transform (MIT) is
assumed to produce k * k pixel (pel) values, and, as a
first approximation, the complexity of the modified
decoding loop becomes k^2/64 of that of the unmodified
loop. Table 1 below illustrates the resulting picture
resolution and corresponding complexity for different
values of k if the unmodified bit stream uses CIF
(352 * 288).



    k    resolution    complexity
    1    44*36         0.015
    2    88*72         0.06
    3    132*108       0.14
    4    176*144       0.25
    5    220*180       0.39
    6    264*216       0.56
    7    308*252       0.77
    8    352*288       1

Table 1
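
By way of illustration only, the entries of Table 1 follow from scaling each picture dimension by k/8 and the first-order complexity by k^2/64. The following is a minimal sketch in C; the structure `scaled_format` and the function `scale_format` are hypothetical names introduced here and do not appear in the description above.

```c
/* Illustrative only: names are assumptions, not part of the described apparatus. */
struct scaled_format {
    int width;          /* pixels per line after downscaling */
    int height;         /* lines after downscaling */
    double complexity;  /* first-order decoding-loop cost relative to k = 8 */
};

/* Compute the picture size and relative complexity when each 8*8 DCT block
 * is decoded to k*k pixels (1 <= k <= 8). */
struct scaled_format scale_format(int width, int height, int k)
{
    struct scaled_format f;
    f.width = width * k / 8;
    f.height = height * k / 8;
    f.complexity = (double)(k * k) / 64.0;
    return f;
}
```

For a CIF input (352 * 288) and k = 4 this yields the QCIF size 176 * 144 at one quarter of the full-resolution complexity, matching the fourth row of Table 1.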



The modified inverse transform is designed without
any significant picture quality loss for still picture
decoding, and is readily apparent to those skilled in the
art. The initial bit stream is organized into a number of
DCT blocks with coefficients representing 8*8 pixel
blocks. The modified IDCT is then used to produce k * k
pixels in each block by using n * n coefficients, where
k <= n <= 8. Examples of such matrices, using n = k as an
example, are listed in Table 2 below, where the basis
functions are seen as columns. The floating point
numbers can be easily approximated by integer numbers to
give limited resolution arithmetic.

Inverse Transform

    0.48    0.43    0.35    0.27    0.18

Table 2
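
By way of illustration only, one possible modified inverse transform for n = k = 4 is sketched below in C. The matrix `MIT4` is an assumption made for this sketch and is not taken from Table 2: it is a standard 4-point inverse DCT rescaled on the assumption that the coefficients come from an orthonormal 8-point forward DCT, so that entry [x][u] equals c(u) * cos((2x+1) * u * pi / 8) with c(0) = 1/sqrt(8) and c(u) = 1/2 for u > 0. The function name `mit4_block` is likewise hypothetical.

```c
/* ASSUMED matrix (not the matrix of Table 2): basis functions are the columns. */
static const double MIT4[4][4] = {
    { 0.35355,  0.46194,  0.35355,  0.19134 },
    { 0.35355,  0.19134, -0.35355, -0.46194 },
    { 0.35355, -0.19134, -0.35355,  0.46194 },
    { 0.35355, -0.46194,  0.35355, -0.19134 },
};

/* Produce 4*4 pixels from an 8*8 coefficient block. Only the 4*4
 * low-frequency corner coef[0..3][0..3] is read, so the high frequency
 * components are discarded. The transform is separable: the same matrix
 * is used in the vertical and the horizontal direction. */
void mit4_block(double coef[8][8], double pix[4][4])
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            double sum = 0.0;
            for (int v = 0; v < 4; v++)
                for (int u = 0; u < 4; u++)
                    sum += MIT4[y][v] * MIT4[x][u] * coef[v][u];
            pix[y][x] = sum;
        }
    }
}
```

Under the stated normalization, a flat 8*8 block of value 100 has a DC coefficient of 800, and the sketch returns a flat 4*4 block of approximately 100, so picture brightness is preserved while the number of multiply-accumulate operations per output block drops.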

The modified predictor (MP) needs to take several aspects
into account:

1. Scaling of the Motion Vector. Originally the
motion vector has a resolution of 1/2 pel. The modified
motion vector will have a resolution of

    k/8 * 1/2, or k/16, pel.

2. If the non-modified motion vector specifies
using full pixels (full-pel), no blurring occurs in the
prediction process. Therefore, the scaled motion
compensation, which might be sub-pel, shall have as
little lowpass filtering as possible. This would
theoretically be implemented with linear-phase allpass
filters, which do not exist, and is, in practice,
implemented by so-called spline-interpolating filters.
Experiments have shown that 4-tap filters are sufficient
(see Table 3a).

3. If the non-modified motion vector is specified
in half pixels (half-pel), blurring will occur in the
prediction process. Accordingly, blurring will also
occur in the scaled prediction. For k=7, tests show that
bilinear blur is adequate (see Table 3c) and that for
k=6, more care is needed. If both the horizontal and the
vertical motion vector components are half-pel, bilinear
blur is used. For all other cases, 4-tap filters with
limited blur are best (see Table 3b). These limited blur
filters are essentially a compromise between allpass and
bilinear filters.



Scaled    a. spline-like         b. compromise          c. bi-linear
mv
0           0  256    0    0       0  256    0    0       0  256    0    0
1/16       -7  251   14   -2      -3  244   16   -1       0  240   16    0
2/16      -12  243   30   -5      -6  232   32   -2       0  224   32    0
3/16      -16  232   48   -8      -8  220   48   -4       0  208   48    0
4/16      -18  218   66  -10      -9  204   66   -5       0  192   64    0
5/16      -20  203   86  -13     -10  186   86   -6       0  176   80    0
6/16      -21  186  107  -16     -10  170  104   -8       0  160   96    0
7/16      -20  167  127  -18     -10  154  121   -9       0  144  112    0
8/16      -19  147  147  -19      -9  137  137   -9       0  128  128    0

Table 3
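
By way of illustration only, the following C sketch shows how a half-pel motion vector component may be converted to the k/16 resolution of item 1 and how one row of 4-tap coefficients from Table 3a may be applied. The function names `scale_mv` and `interpolate_1d` are hypothetical. Two points are assumptions of this sketch rather than statements of the description: fractional offsets above 8/16 are served by the mirrored row of the table, and plain round-to-nearest is used in place of the rounding term R described under item 4.

```c
/* Table 3a (spline-like) taps for offsets 0/16 .. 8/16; each row sums to 256. */
static const int SPLINE_TAPS[9][4] = {
    {   0, 256,   0,   0 },
    {  -7, 251,  14,  -2 },
    { -12, 243,  30,  -5 },
    { -16, 232,  48,  -8 },
    { -18, 218,  66, -10 },
    { -20, 203,  86, -13 },
    { -21, 186, 107, -16 },
    { -20, 167, 127, -18 },
    { -19, 147, 147, -19 },
};

/* Split a half-pel motion vector component into a full-pel part and a
 * 1/16-pel fraction for a picture scaled by k/8: (mv/2) * (k/8) = mv*k/16. */
void scale_mv(int mv_half_pel, int k, int *full, int *frac16)
{
    int v = mv_half_pel * k;          /* displacement in 1/16 pel */
    *full = v / 16;
    *frac16 = v % 16;
    if (*frac16 < 0) {                /* floor division for negative vectors */
        *frac16 += 16;
        *full -= 1;
    }
}

/* Interpolate at p[1] + frac16/16 (toward p[2]) from four neighbouring pels.
 * ASSUMPTION: offsets above 8/16 use the mirrored table row. */
int interpolate_1d(const int p[4], int frac16)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        int tap = (frac16 <= 8) ? SPLINE_TAPS[frac16][i]
                                : SPLINE_TAPS[16 - frac16][3 - i];
        sum += tap * p[i];
    }
    int value = (sum + 128) / 256;    /* plain rounding, not the R-based scheme */
    if (value < 0) value = 0;         /* clamp to the 8-bit pel range */
    if (value > 255) value = 255;
    return value;
}
```

For example, a motion vector of 3 half-pels (1.5 pel) with k = 4 becomes 12/16 pel in the scaled picture, and interpolating the pels 10, 20, 30, 40 at that offset gives 28, which is the rounded value of the expected 27.5.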



C-code for filter selection

hor_filter = ver_filter = a;  // assume full-pel in non-modified case
if (k == 7)
    { if (mv_hor is half-pel) hor_filter = c;
      if (mv_ver is half-pel) ver_filter = c; }
else if (k == 6)
    { if (mv_hor && mv_ver are both half-pel)
          hor_filter = ver_filter = c;
      else if (mv_hor is half-pel) hor_filter = b;
      else if (mv_ver is half-pel) ver_filter = b; }
else if (k > 2)
    { if (mv_hor is half-pel) hor_filter = b;
      if (mv_ver is half-pel) ver_filter = b; }
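
By way of illustration only, the filter selection listing above may be written as compilable C as follows. The enumeration `filter_type` and the function `select_filters` are hypothetical names introduced for this sketch; the flags `hor_half_pel` and `ver_half_pel` are assumed to be non-zero when the corresponding non-modified motion vector component is half-pel (for a vector stored in half-pel units, when the component is odd).

```c
/* Filters a, b and c of Table 3. Names are assumptions of this sketch. */
enum filter_type { FILTER_A_SPLINE, FILTER_B_COMPROMISE, FILTER_C_BILINEAR };

/* Select the horizontal and vertical interpolation filters for scale k. */
void select_filters(int k, int hor_half_pel, int ver_half_pel,
                    enum filter_type *hor_filter,
                    enum filter_type *ver_filter)
{
    /* Assume full-pel in the non-modified case. */
    *hor_filter = *ver_filter = FILTER_A_SPLINE;

    if (k == 7) {
        if (hor_half_pel) *hor_filter = FILTER_C_BILINEAR;
        if (ver_half_pel) *ver_filter = FILTER_C_BILINEAR;
    } else if (k == 6) {
        if (hor_half_pel && ver_half_pel)
            *hor_filter = *ver_filter = FILTER_C_BILINEAR;
        else if (hor_half_pel)
            *hor_filter = FILTER_B_COMPROMISE;
        else if (ver_half_pel)
            *ver_filter = FILTER_B_COMPROMISE;
    } else if (k > 2) {
        if (hor_half_pel) *hor_filter = FILTER_B_COMPROMISE;
        if (ver_half_pel) *ver_filter = FILTER_B_COMPROMISE;
    }
}
```

As in the listing, values of k of 2 or less always use the spline-like filter a.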


4. The non-modified prediction process uses
rounding in the half-pel interpolation. Rounding can be
either upwards or downwards depending on pixel values.
In the scaled case, the rounding must correspond to avoid
drift. The following method guarantees the same
probability for up-rounding with respect to down-rounding
to minimize long term drift. In the description below,
r normally has the value zero (0). However, in some
cases it can have the value one (1). For example, in
MPEG-4 and H.263 it is possible to transmit the value of
r as side information.

Non-modified rounding

if (mv_hor && mv_ver are both half-pel)
    p = (a+b+c+d+2-r)//4;
else if (mv_hor || mv_ver is half-pel)
    p = (a+b+1-r)//2;

Modified rounding

if (mv_hor && mv_ver are both half-pel) R = 256*(10-4r);
else if (mv_hor || mv_ver is half-pel) R = 256*(12-8r);
else R = 256*8;

The usage of R depends on the scaled mv ...

if (mv_hor_scaled && mv_ver_scaled are both sub-pel)
    { predicted_pel = (filter_cof(0)*pre_pel(0) + ... +
      filter_cof(15)*pre_pel(15) + (R<<4)) >> 16; }
else if (mv_hor_scaled || mv_ver_scaled one is sub-pel)
    { predicted_pel = (filter_cof(0)*pre_pel(0) + ... +
      filter_cof(3)*pre_pel(3) + (R>>4)) >> 8; }
else
    predicted_pel = pre_pel(0);

As can be seen, R is scaled to match the number to be
derived. For example, with only one sub-pel scaled motion
vector and one half-pel non-scaled motion vector,
R = 256*(12-8r) = 256 * 12 or 256 * 4. When this number is
scaled with >>4, it assumes a value of 192 or 64. An
expression such as (x + 64)>>8 then has certain
probabilities of being rounded upwards, downwards, or not
rounded at all. These probabilities shall match the
corresponding probabilities in the non-scaled case as
well as possible, which means that the long-term amount
of up and down rounding shall be the same.
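
By way of illustration only, the modified rounding may be sketched in C as follows. The function names `rounding_term`, `predict_1d` and `predict_2d` are hypothetical. The sketch assumes that the filtered sums are non-negative, since right-shifting a negative value is implementation-defined in C, and that the filter coefficients sum to 256 per dimension as in Table 3.

```c
/* Rounding term R chosen from the non-scaled motion vector; r is 0 or 1. */
int rounding_term(int hor_half_pel, int ver_half_pel, int r)
{
    if (hor_half_pel && ver_half_pel)
        return 256 * (10 - 4 * r);
    if (hor_half_pel || ver_half_pel)
        return 256 * (12 - 8 * r);
    return 256 * 8;
}

/* One scaled component is sub-pel: sum is a 4-tap result scaled by 256. */
int predict_1d(int sum, int R)
{
    return (sum + (R >> 4)) >> 8;
}

/* Both scaled components are sub-pel: sum is a 16-tap result scaled by 65536. */
int predict_2d(long sum, int R)
{
    return (int)((sum + ((long)R << 4)) >> 16);
}
```

With one half-pel non-scaled component, the offset added before the shift by 8 is 192 for r = 0 and 64 for r = 1, in agreement with the example above; with a full-pel non-scaled vector the offset is 128, which is ordinary round-to-nearest.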

FIGURE 4 is a flow chart illustrating the decoding
method according to a preferred embodiment of the present
invention.

First, the compressed video bit stream from, for
example, transmitter apparatus 102 is received by the
receiver apparatus 104 for processing by the processing
circuitry 107 thereof as shown by block 150. If the bit
stream corresponds to a video signal having a resolution
which is the same as the resolution of the display unit
112 at the receiver location (NO output of decision block
152), the signal is decoded (block 154) and ultimately
used to display a picture on the display unit as
illustrated by block 156.

If the bit stream corresponds to a resolution which
is higher than the resolution of the display unit 112
(YES output of decision block 152), the signal is first
downscaled (block 158) and then decoded (block 160)
before being used to display the lower resolution picture
on the display unit as shown in block 156.
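
By way of illustration only, the decision of block 152 may be sketched in C as follows. The names `decode_path`, `choose_path` and `choose_k` are hypothetical. The description does not state how the value of k is selected; the helper `choose_k` assumes, purely for this sketch, that the largest k whose scaled picture fits the display is used, and `choose_path` assumes that the bit stream counts as higher resolution when it exceeds the display in either dimension.

```c
enum decode_path { DECODE_FULL, DOWNSCALE_THEN_DECODE };

/* Decision block 152: compare bit stream resolution with display resolution. */
enum decode_path choose_path(int stream_w, int stream_h,
                             int disp_w, int disp_h)
{
    if (stream_w > disp_w || stream_h > disp_h)
        return DOWNSCALE_THEN_DECODE;   /* YES output: blocks 158 and 160 */
    return DECODE_FULL;                 /* NO output: block 154 */
}

/* ASSUMED policy: largest k in 1..8 whose scaled picture fits the display. */
int choose_k(int stream_w, int stream_h, int disp_w, int disp_h)
{
    for (int k = 8; k >= 1; k--) {
        if (stream_w * k / 8 <= disp_w && stream_h * k / 8 <= disp_h)
            return k;
    }
    return 1;
}
```

For a CIF bit stream and a QCIF display, the sketch selects the downscaling path with k = 4.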

While what has been described herein constitutes 
presently most preferred embodiments of the invention, it 
should be recognized that the invention could take 
numerous other forms. Accordingly, it should be 
understood that the invention is to be limited only 
insofar as is required by the scope of the following 
claims.